Evaluation of Re-identification Risk for Anonymized Clinical Documents

Authors

  • Parveen Kumar
  • Rajan Sareen

Abstract

Sharing clinical data in the interest of transparency has the potential to strengthen academic research, the practice of medicine and the integrity of clinical trial systems. The practice is gaining traction among leading pharmaceutical companies, but it raises concerns about the potential leakage of personal health information of patients who participate in clinical trials. The European Medicines Agency (EMA) has published Phase 1 of its Policy 0070, which sets out requirements for anonymizing the clinical documents of all studies submitted to the EMA for marketing authorization. This signals a commitment to data sharing and increases the urgency of addressing the risk of re-identification. This paper discusses different scenarios in the anonymization of clinical documents, the implications of re-identification risk, and the need for automation.

INTRODUCTION

The EMA is not alone in moving towards clinical data sharing; other associations and large pharmaceutical companies are also working to make data sharing mandatory. Pharmaceutical companies have started submitting anonymized clinical documents to the EMA, redacting any text that meets the criteria for personal information or Commercially Confidential Information (CCI). In this paper, "clinical documents" refers to all reports that must be submitted under Phase 1 of Policy 0070. In July 2013, member companies of the Pharmaceutical Research and Manufacturers of America (PhRMA) and the European Federation of Pharmaceutical Industries and Associations (EFPIA) committed to sharing complete Clinical Study Reports (CSRs), along with Individual Patient Data (IPD), with qualified scientific and medical researchers as necessary to conduct legitimate research. More recently, in June 2017, the International Committee of Medical Journal Editors (ICMJE) introduced a requirement that clinical trial reports published in journals following ICMJE recommendations include a data sharing statement.

However, this paper focuses mainly on risk assessment methods for clinical documents submitted under EMA Policy 0070. Redaction can compromise data utility; hence the EMA is interested in alternative anonymization techniques, such as replacement, generalization and date offsetting, combined with a more clearly established method for calculating the quantitative risk of re-identification. Applying these techniques manually requires considerable effort and resources, so the industry needs an automated process for anonymizing clinical documents such as study reports and clinical narratives. Anonymization methodologies must include a way of measuring re-identification risk and a repeatable process to follow. Submissions made to the EMA under Phase 1 of Policy 0070 have included qualitative risk assessments. This paper discusses the risk assessment methods suggested in the literature for anonymized clinical documents.
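The abstract does not say which quantitative measure the authors recommend. As a minimal sketch, assuming a small tabular extract whose quasi-identifiers are age, sex and country (all hypothetical), the Python below illustrates two of the techniques named above, generalization and date offsetting, together with one common quantitative measure from the re-identification literature: equivalence-class risk, where a record that shares its quasi-identifier values with f records has risk 1/f (the idea behind k-anonymity). The function names, the 10-year age band and the ±30-day offset window are illustrative assumptions, not the paper's method or EMA-specified parameters.

```python
import random
from collections import Counter
from datetime import date, timedelta

# Hypothetical toy dataset: (patient_id, age, sex, country, visit_date).
# Values are invented for illustration only.
RECORDS = [
    ("P01", 34, "F", "DE", date(2015, 3, 2)),
    ("P02", 37, "F", "DE", date(2015, 4, 11)),
    ("P03", 52, "M", "FR", date(2015, 5, 20)),
    ("P04", 58, "M", "FR", date(2015, 6, 7)),
    ("P05", 36, "F", "DE", date(2015, 7, 15)),
    ("P06", 55, "M", "FR", date(2015, 8, 30)),
]


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def offset_dates(records, max_days=30, seed=0):
    """Date offsetting: shift every date of a patient by one random per-patient
    offset, so intervals between that patient's own events are preserved."""
    rng = random.Random(seed)
    offsets = {}
    shifted = []
    for pid, age, sex, country, visit in records:
        if pid not in offsets:
            offsets[pid] = timedelta(days=rng.randint(-max_days, max_days))
        shifted.append((pid, age, sex, country, visit + offsets[pid]))
    return shifted


def quasi_identifiers(records, generalize=False):
    """Project records onto the assumed quasi-identifiers (age, sex, country)."""
    return [
        (generalize_age(age) if generalize else age, sex, country)
        for _, age, sex, country, _ in records
    ]


def reidentification_risk(quasi_ids):
    """Equivalence-class risk: a record whose quasi-identifier combination is
    shared by f records has risk 1/f. Returns (maximum, average) over records."""
    class_sizes = Counter(quasi_ids)
    per_record = [1 / class_sizes[q] for q in quasi_ids]
    return max(per_record), sum(per_record) / len(per_record)


if __name__ == "__main__":
    raw_max, raw_avg = reidentification_risk(quasi_identifiers(RECORDS))
    gen_max, gen_avg = reidentification_risk(
        quasi_identifiers(RECORDS, generalize=True)
    )
    print(f"exact ages:       max={raw_max:.2f} avg={raw_avg:.2f}")
    print(f"generalized ages: max={gen_max:.2f} avg={gen_avg:.2f}")
    print(offset_dates(RECORDS)[0])
```

On this toy data every exact age is unique, so both maximum and average risk are 1.00; after 10-year banding the six records fall into two classes of three, and both drop to 0.33. This shows the trade-off the paragraph describes: generalization lowers measurable risk at the cost of precision, while date offsetting keeps intervals within each patient's timeline intact.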

Publication date: 2017